Abstract. Nicod’s criterion states that observing a black raven is evidence for the hypothesis H that all ravens are black. We show that Solomonoff induction does not satisfy Nicod’s criterion: there are time steps in which observing black ravens decreases the belief in H. Moreover, while observing any computable infinite string compatible with H, the belief in H decreases infinitely often when using the unnormalized Solomonoff prior, but only finitely often when using the normalized Solomonoff prior. We argue that the fault is not with Solomonoff induction; instead we should reject Nicod’s criterion.


Keywords: Bayesian reasoning, confirmation, disconfirmation, Hempel’s paradox, equivalence condition, Solomonoff normalization.

1 Introduction 

Inductive inference, how to generalize from examples, is the cornerstone of scientific investigation. But we cannot justify the use of induction on the grounds that it has reliably worked before, because this argument presupposes induction. Instead, we need to give deductive (logical) arguments for the use of induction. Today we know a formal solution to the problem of induction: Solomonoff’s theory of learning […], also known as universal induction or Solomonoff induction. It is a method of induction based on Bayesian inference [9] and algorithmic probability […]. Because it is solidly founded in abstract mathematics, it can be justified purely deductively.

Solomonoff defines a prior probability distribution $M$ that assigns to a string $x$ the probability that a universal monotone Turing machine prints something starting with $x$ when fed with fair coin flips. Solomonoff’s prior encompasses Ockham’s razor by favoring simple explanations over complex ones: algorithmically simple strings have short programs and are thus assigned higher probability than complex strings that do not have short programs. Moreover, Solomonoff’s prior respects Epicurus’ principle of multiple explanation by never discarding possible explanations: any possible program that explains the string contributes to the probability [8].

For data drawn from a computable probability distribution $\mu$, Solomonoff induction will converge to the correct belief about any hypothesis [1]. Moreover, this can be used to produce reliable predictions extremely fast: Solomonoff induction will make a total of at most $E + O(\sqrt{E})$ errors when predicting the next data points, where $E$ is the number of errors of the informed predictor that knows $\mu$ [7]. In this sense, Solomonoff induction solves the induction problem […]. It is incomputable, hence it can only serve as an ideal that any practical learning algorithm should strive to approximate.

But does Solomonoff induction live up to this ideal? Suppose we entertain the hypothesis $H$ that all ravens are black. Since this is a universally quantified statement, it is refuted by observing one counterexample: a non-black raven. But at any time step, we have observed only a finite number of the potentially infinite number of possible cases. Nevertheless, Solomonoff induction maximally confirms the hypothesis $H$ asymptotically.

This paper is motivated by a problem of inductive inference extensively discussed in the literature: the paradox of confirmation, also known as Hempel’s paradox […]. It relies on the following three principles.

- Nicod’s criterion [14, p. 67]: observing an F that is a G increases our belief in the hypothesis that all Fs are Gs.

- The equivalence condition: logically equivalent hypotheses are confirmed or disconfirmed by the same evidence.

- The paradoxical conclusion: a green apple confirms $H$.


The argument goes as follows. The hypothesis H is logically equivalent to the 
hypothesis H' that all non-black objects are non-ravens. According to Nicod’s 
criterion, any non-black non-raven, such as a green apple, confirms H'. But then 
the equivalence condition entails the paradoxical conclusion. 

The paradox of confirmation has been discussed extensively in the literature on the philosophy of science […]; see […] for a survey. Support for Nicod’s criterion is not uncommon […] and no consensus is in sight.

Using results from algorithmic information theory we show that Solomonoff induction avoids the paradoxical conclusion because it does not fulfill Nicod’s criterion. There are time steps when (counterfactually) observing a black raven disconfirms the hypothesis that all ravens are black (Theorem 7 and Corollary 12). In the deterministic setting Nicod’s criterion is even violated infinitely often (Theorem 8 and Corollary 13). However, if we normalize Solomonoff’s prior and observe a deterministic computable infinite string, Nicod’s criterion is violated at most finitely many times (Theorem 11). Our results are independent of the choice of the universal Turing machine. A list of notation can be found at the end of the paper.


2 Preliminaries 

Let $\mathcal{X}$ be some finite set called alphabet. The set $\mathcal{X}^* := \bigcup_{n=0}^{\infty} \mathcal{X}^n$ is the set of all finite strings over the alphabet $\mathcal{X}$, and the set $\mathcal{X}^\infty$ is the set of all infinite strings over the alphabet $\mathcal{X}$. The empty string is denoted by $\epsilon$, not to be confused with the small positive rational number $\varepsilon$. Given a string $x \in \mathcal{X}^*$, we denote its length by $|x|$. For a (finite or infinite) string $x$ of length $\geq k$, we denote with $x_{1:k}$ the first $k$ characters of $x$, and with $x_{<k}$ the first $k-1$ characters of $x$. The notation $x_{1:\infty}$ stresses that $x$ is an infinite string. We write $x \sqsubseteq y$ iff $x$ is a prefix of $y$, i.e., $x = y_{1:|x|}$.

A semimeasure over the alphabet $\mathcal{X}$ is a probability measure on the probability space $\mathcal{X}^\sharp := \mathcal{X}^* \cup \mathcal{X}^\infty$ whose σ-algebra is generated by the cylinder sets $\Gamma_x := \{xz \mid z \in \mathcal{X}^\sharp\}$ […, Ch. 4.2]. If a semimeasure assigns zero probability to every finite string, then it is called a measure. Measures and semimeasures are uniquely defined by their values on cylinder sets. For convenience we identify a string $x \in \mathcal{X}^*$ with its cylinder set $\Gamma_x$.

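For two functions $f, g : \mathcal{X}^* \to \mathbb{R}$ we use the notation $f \stackrel{\times}{\geq} g$ iff there is a constant $c > 0$ such that $f(x) \geq c\,g(x)$ for all $x \in \mathcal{X}^*$. Moreover, we define $f \stackrel{\times}{\leq} g$ iff $g \stackrel{\times}{\geq} f$, and we define $f \stackrel{\times}{=} g$ iff $f \stackrel{\times}{\leq} g$ and $f \stackrel{\times}{\geq} g$. Note that $f \stackrel{\times}{=} g$ does not imply that there is a constant $c$ such that $f(x) = c\,g(x)$ for all $x$.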
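A small illustration of the last remark (an example of our own, not taken from the cited sources): let $g$ be constant and let $f$ depend only on the parity of $|x|$.

```latex
f(x) := \begin{cases} 1 & \text{if } |x| \text{ is even},\\ 2 & \text{if } |x| \text{ is odd},\end{cases}
\qquad g(x) := 1,
\qquad g(x) \le f(x) \le 2\,g(x) \ \text{ for all } x \in \mathcal{X}^*.
```

Hence $f \stackrel{\times}{=} g$, although the ratio $f(x)/g(x)$ alternates between 1 and 2 and is therefore not constant.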

Let $U$ denote some universal Turing machine. The Kolmogorov complexity $K(x)$ of a string $x$ is the length of the shortest program on $U$ that prints $x$ and then halts. A string $x$ is incompressible iff $K(x) \geq |x|$. We define $m(t) := \min_{n \geq t} K(n)$, the monotone lower bound on $K$. Note that $m$ grows slower than any unbounded computable function. (Its inverse is a version of the busy beaver function.) We also use the same machine $U$ as a monotone Turing machine by ignoring the halting state and using a write-only output tape. The monotone Kolmogorov complexity $Km(x)$ denotes the length of the shortest program on the monotone machine $U$ that prints a string starting with $x$. Since monotone complexity does not require the machine to halt, there is a constant $c$ such that $Km(x) \leq K(x) + c$ for all $x \in \mathcal{X}^*$.

Solomonoff’s prior $M$ […] is defined as the probability that the universal monotone Turing machine computes a string when fed with fair coin flips in the input tape. Formally,

$$M(x) := \sum_{p:\ x \sqsubseteq U(p)} 2^{-|p|}.$$

Equivalently, the Solomonoff prior $M$ can be defined as a mixture over all lower semicomputable semimeasures [20].

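The function $M$ is a lower semicomputable semimeasure, but not computable and not a measure […, Lem. 4.5.3]. It can be turned into a measure $M_{\text{norm}}$ using Solomonoff normalization […, Sec. 4.5.3]: $M_{\text{norm}}(\epsilon) := 1$ and for all $x \in \mathcal{X}^*$ and $a \in \mathcal{X}$,

$$M_{\text{norm}}(xa) := M_{\text{norm}}(x)\,\frac{M(xa)}{\sum_{b \in \mathcal{X}} M(xb)}, \tag{1}$$

which is well-defined since $M(x) > 0$ for all $x \in \mathcal{X}^*$.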
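The following Python sketch illustrates how (1) turns a semimeasure into a measure. Since $M$ itself is incomputable, it uses a hypothetical computable stand-in `toy_M` on the binary alphabet: a mixture of a program that prints zeros forever, fair coin flips, and a program that prints `01` and then loops, so that mass leaks below the string `01`. The names `toy_M` and `normalize` are illustrative only and not part of the original text.

```python
from fractions import Fraction
from functools import lru_cache

ALPHABET = ("0", "1")

def toy_M(x: str) -> Fraction:
    """Hypothetical toy semimeasure (NOT Solomonoff's M, which is incomputable)."""
    mass = Fraction(3, 10) / 2 ** len(x)   # fair coin flips, weight 3/10
    if set(x) <= {"0"}:
        mass += Fraction(1, 2)             # program printing 000... forever, weight 1/2
    if "01".startswith(x):
        mass += Fraction(1, 5)             # program printing "01" then looping, weight 1/5
    return mass

@lru_cache(maxsize=None)
def normalize(x: str) -> Fraction:
    """Solomonoff normalization following equation (1), recursing on the last symbol."""
    if x == "":
        return Fraction(1)
    parent = x[:-1]
    total = sum(toy_M(parent + b) for b in ALPHABET)
    return normalize(parent) * toy_M(x) / total
```

Because `toy_M` loses mass only below `01`, `normalize` rescales exactly there. With exact `Fraction` arithmetic one can check that the normalized values add up at every node (so the result is a measure) and never fall below `toy_M`, in line with Lemma 9 below.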

Every program contributes to $M$, so we have that $M(x) \geq 2^{-Km(x)}$. However, the upper bound $M(x) \stackrel{\times}{\leq} 2^{-Km(x)}$ is generally false [1]. Instead, the following weaker statement holds.



Lemma 1 ([10] as cited in […, p. 75]). Let $E \subset \mathcal{X}^*$ be a recursively enumerable and prefix-free set. Then there is a constant $c_E \in \mathbb{N}$ such that $M(x) \leq 2^{-Km(x) + c_E}$ for all $x \in E$.


Proof. Define

$$\nu(x) := \begin{cases} M(x), & \text{if } x \in E, \text{ and}\\ 0, & \text{otherwise.} \end{cases}$$

The function $\nu$ is lower semicomputable because $E$ is recursively enumerable. Furthermore, $\sum_{x \in \mathcal{X}^*} \nu(x) \leq 1$ because $M$ is a semimeasure and $E$ is prefix-free. Therefore $\nu$ is a discrete semimeasure. Hence there are constants $c$ and $c'$ such that $Km(x) \leq K(x) + c \leq -\log \nu(x) + c + c' = -\log M(x) + c + c'$ for all $x \in E$ [11, Cor. 4.3.1], so $c_E := c + c'$ works. □


Lemma 2 ([…, Sec. 4.5.7]). For any computable measure $\mu$, the set of $\mu$-Martin-Löf-random sequences has $\mu$-probability one:

$$\mu\bigl(\{x \in \mathcal{X}^\infty \mid \exists c\ \forall t.\ M(x_{1:t}) \leq c\,\mu(x_{1:t})\}\bigr) = 1.$$


3 Solomonoff and the Black Ravens 

Setup. In order to formalize the black raven problem (in line with [15, Sec. 7.4]), we define two predicates: blackness $B$ and ravenness $R$. There are four possible observations: a black raven $BR$, a non-black raven $\overline{B}R$, a black non-raven $B\overline{R}$, and a non-black non-raven $\overline{B}\overline{R}$. Therefore our alphabet consists of four symbols corresponding to each of the possible observations, $\mathcal{X} := \{BR, \overline{B}R, B\overline{R}, \overline{B}\overline{R}\}$. We will not make the formal distinction between observations and the symbols that represent them, and simply use both interchangeably.

We are interested in the hypothesis ‘all ravens are black’. Formally, it corresponds to the set

$$H := \{x \in \mathcal{X}^\sharp \mid \forall t.\ x_t \neq \overline{B}R\} = \{BR, B\overline{R}, \overline{B}\overline{R}\}^\sharp, \tag{2}$$

the set of all finite and infinite strings in which the symbol $\overline{B}R$ does not occur. Let $H^c := \mathcal{X}^\sharp \setminus H$ be the complement hypothesis ‘there is at least one non-black raven’. We fix the definition of $H$ and $H^c$ for the rest of this paper.

Using Solomonoff induction, our prior belief in the hypothesis $H$ is

$$M(H) = \sum_{p:\ U(p) \in H} 2^{-|p|},$$

the cumulative weight of all programs that do not print any non-black ravens.
In each time step $t$, we make one observation $x_t \in \mathcal{X}$. Our history $x_{<t} = x_1 x_2 \ldots x_{t-1}$ is the sequence of all previous observations. We update our belief with Bayes’ rule in accordance with the Bayesian framework for learning [9]: our posterior belief in the hypothesis $H$ is

$$M(H \mid x_{1:t}) = \frac{M(H \cap x_{1:t})}{M(x_{1:t})}.$$

We say that the observation $x_t$ confirms the hypothesis $H$ iff $M(H \mid x_{1:t}) > M(H \mid x_{<t})$ (the belief in $H$ increases), and we say that the observation $x_t$ disconfirms the hypothesis $H$ iff $M(H \mid x_{1:t}) < M(H \mid x_{<t})$ (the belief in $H$ decreases). If $M(H \mid x_{1:t}) = 0$, we say that $H$ is refuted, and if $M(H \mid x_{1:t}) \to 1$ as $t \to \infty$, we say that $H$ is (maximally) confirmed asymptotically.

Confirmation and Refutation. Let the sequence $x_{1:\infty}$ be sampled from a computable measure $\mu$, the true environment. If we observe a non-black raven, $x_t = \overline{B}R$, the hypothesis $H$ is refuted since $H \cap x_{1:t} = \emptyset$ and this implies $M(H \mid x_{1:t}) = 0$. In this case, our enquiry regarding $H$ is settled. For the rest of this paper, we focus on the interesting case: we assume our hypothesis $H$ is in fact true in $\mu$ ($\mu(H) = 1$), i.e., $\mu$ does not generate any non-black ravens. Since Solomonoff’s prior $M$ dominates all computable measures, there is a constant $w_\mu > 0$ such that

$$\forall x \in \mathcal{X}^*.\ M(x) \geq w_\mu\, \mu(x). \tag{3}$$

Thus Blackwell and Dubins’ famous merging of opinions theorem [1] implies

$$M(H \mid x_{1:t}) \to 1 \text{ as } t \to \infty \text{ with } \mu\text{-probability one.}^{1} \tag{4}$$

Therefore our hypothesis $H$ is confirmed asymptotically [15, Sec. 7.4]. However, convergence to 1 is extremely slow, slower than any unbounded computable function, since $1 - M(H \mid x_{1:t}) \stackrel{\times}{\geq} 2^{-m(t)}$ for all $t$.

In our setup, the equivalence condition holds trivially: a logically equivalent way of formulating a hypothesis yields the same set of infinite strings, therefore in our formalization it constitutes the same hypothesis. The central question of this paper is Nicod’s criterion, which refers to the assertion that $BR$ and $\overline{B}\overline{R}$ confirm $H$, i.e., $M(H \mid x_{<t}BR) > M(H \mid x_{<t})$ and $M(H \mid x_{<t}\overline{B}\overline{R}) > M(H \mid x_{<t})$ for all strings $x_{<t}$.


4 Disconfirming H 

We first illustrate the violation of Nicod’s criterion by defining a particular universal Turing machine.

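Example 3 (Black Raven Disconfirms). The observation of a black raven can falsify a short program that supported the hypothesis $H$. Let $\varepsilon > 0$ be a small rational number. We define a semimeasure $\rho$ as follows.

$$\rho(BR^\infty) := \tfrac{1}{4} \qquad \rho(\overline{B}\overline{R}^\infty) := \tfrac{1}{2} \qquad \rho(BR\,\overline{B}R^\infty) := \tfrac{1}{4} - \varepsilon \qquad \rho(x) := 0 \text{ otherwise.}$$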

¹ Blackwell-Dubins’ theorem refers to (probability) measures, but technically $M$ is a semimeasure. However, we can view $M$ as a measure by introducing an extra symbol to our alphabet [11, p. 264]. This preserves dominance (3), and hence absolute continuity, which is the precondition for Blackwell-Dubins’ theorem.
Fig. 1: The definitions of the values $A$, $B$, $C$, $D$, and $E$. Note that by assumption, $x_{<t}$ does not contain non-black ravens, therefore $M(\{x_{<t}\} \cap H^c) = M(\emptyset) = 0$.


To get a universally dominant semimeasure we mix $\rho$ with the universally dominant semimeasure $M$:

$$\xi(x) := \rho(x) + \varepsilon M(x).$$

For computable $\varepsilon$, the mixture $\xi$ is a lower semicomputable semimeasure. Hence there is a universal monotone Turing machine whose Solomonoff prior is equal to $\xi$ [20, Lem. 13]. Our a priori belief in $H$ at time $t = 0$ is

$$\xi(H \mid \epsilon) = \xi(H) \geq \rho(BR^\infty) + \rho(\overline{B}\overline{R}^\infty) = 75\%,$$

while our a posteriori belief in $H$ after seeing a black raven is

$$\xi(H \mid BR) = \frac{\xi(H \cap BR)}{\xi(BR)} \leq \frac{\rho(BR^\infty) + \varepsilon}{\rho(BR^\infty) + \rho(BR\,\overline{B}R^\infty)} < 75\%$$

for $\varepsilon \leq 7\%$. Hence observing a black raven in the first time step disconfirms the hypothesis $H$. ◊
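The arithmetic of Example 3 can be double-checked with exact rationals. This Python sketch only evaluates the two bounds displayed above; the function name `example3_bounds` and the ASCII labels (`nBR` for $\overline{B}R$, `nBnR` for $\overline{B}\overline{R}$) are our own.

```python
from fractions import Fraction

def example3_bounds(eps: Fraction) -> tuple[Fraction, Fraction]:
    """Return (lower bound on xi(H), upper bound on xi(H | BR)) from Example 3."""
    rho_BR = Fraction(1, 4)             # rho(BR^inf): black ravens forever
    rho_nBnR = Fraction(1, 2)           # rho(nBnR^inf): non-black non-ravens forever
    rho_BR_nBR = Fraction(1, 4) - eps   # rho(BR nBR^inf): one black raven, then non-black ravens
    prior_lower = rho_BR + rho_nBnR     # both strings lie in H
    # eps * M adds at most eps to the numerator; the denominator is at least
    # the rho-mass of the two strings that start with BR.
    posterior_upper = (rho_BR + eps) / (rho_BR + rho_BR_nBR)
    return prior_lower, posterior_upper

prior, posterior = example3_bounds(Fraction(7, 100))
print(prior, posterior)
```

For $\varepsilon = 1/14$ the upper bound equals exactly $3/4$, so the argument goes through for every $\varepsilon < 1/14 \approx 7.1\%$, which includes $\varepsilon = 7\%$.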


The rest of this section is dedicated to showing that this effect occurs independently of the universal Turing machine $U$ and on all computable infinite strings.


4.1 Setup 

At time step $t$, we have seen the history $x_{<t}$ and now update our belief using the new symbol $x_t$. To understand what happens, we split all possible programs into five categories.

(a) Programs that never print non-black ravens (compatible with $H$), but become falsified at time step $t$ because they print a symbol other than $x_t$.

(b) Programs that eventually print a non-black raven (contradict $H$), but become falsified at time step $t$ because they print a symbol other than $x_t$.

(c) Programs that never print non-black ravens (compatible with $H$), and predict $x_t$ correctly.

(d) Programs that eventually print a non-black raven (contradict $H$), and predict $x_t$ correctly.

(e) Programs that do not print additional symbols after printing $x_{<t}$ (because they go into an infinite loop).

Let $A$, $B$, $C$, $D$, and $E$ denote the cumulative contributions of these five categories of programs to $M$. A formal definition is given in Figure 1 and implicitly depends on the current time step $t$ and the observed string $x_{1:t}$. The values of $A$, $B$, $C$, $D$, and $E$ are in the interval $[0,1]$ since they are probabilities. Moreover, the following holds.

$$\begin{aligned}
M(x_{<t}) &= A + B + C + D + E, & M(x_{1:t}) &= C + D, & &(5)\\
M(x_{<t} \cap H) &= A + C + E, & M(x_{1:t} \cap H) &= C, & &(6)\\
M(H \mid x_{<t}) &= \frac{A + C + E}{A + B + C + D + E}, & M(H \mid x_{1:t}) &= \frac{C}{C + D}. & &(7)
\end{aligned}$$
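For reference, here is a restatement of Figure 1 in formulas, reconstructed from categories (a) to (e) and consistent with the identities above; the sums range over symbols $a \in \mathcal{X}$ with $a \neq x_t$.

```latex
\begin{aligned}
A &= \sum_{a \neq x_t} M(x_{<t}a \cap H), & B &= \sum_{a \neq x_t} M(x_{<t}a \cap H^c),\\
C &= M(x_{1:t} \cap H),                    & D &= M(x_{1:t} \cap H^c),\\
E &= M(\{x_{<t}\} \cap H) = M(\{x_{<t}\}). & &
\end{aligned}
```

Adding the five terms recovers $M(x_{<t}) = A + B + C + D + E$, because the cylinder of $x_{<t}$ splits disjointly into the cylinders of $x_{<t}a$ for $a \in \mathcal{X}$ and the single finite string $x_{<t}$.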


We use results from algorithmic information theory to derive bounds on A, 
B, C, D, and E. This lets us apply the following lemma which states a necessary 
and sufficient condition for confirmation/disconfirmation at time step t. 


Lemma 4 (Confirmation Criterion). Observing $x_t$ confirms (disconfirms) the hypothesis $H$ if and only if $AD + DE < BC$ ($AD + DE > BC$).


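Proof. The hypothesis is confirmed if and only if

$$M(H \mid x_{1:t}) - M(H \mid x_{<t}) = \frac{C}{C + D} - \frac{A + C + E}{A + B + C + D + E} = \frac{BC - AD - DE}{(A + B + C + D + E)(C + D)}$$

is positive. Since the denominator is positive, this is equivalent to $BC > AD + DE$. The disconfirmation case follows in the same way from the sign of the numerator being negative. □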
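Because the proof of Lemma 4 is pure algebra, it can be sanity-checked mechanically. The Python sketch below (the function names are our own) compares the belief change computed from (5) to (7) with the closed form obtained in the proof, using exact rationals so that equality can be tested exactly.

```python
from fractions import Fraction

def belief_change(A, B, C, D, E):
    """M(H | x_{1:t}) - M(H | x_{<t}), computed from identities (5) to (7)."""
    before = (A + C + E) / (A + B + C + D + E)   # M(H | x_{<t})
    after = C / (C + D)                          # M(H | x_{1:t})
    return after - before

def closed_form(A, B, C, D, E):
    """Right-hand side obtained in the proof of Lemma 4."""
    return (B * C - A * D - D * E) / ((A + B + C + D + E) * (C + D))

def confirms(A, B, C, D, E):
    """Lemma 4: observing x_t confirms H iff AD + DE < BC."""
    return A * D + D * E < B * C

values = [Fraction(1, 2), Fraction(1, 10), Fraction(1, 4), Fraction(1, 20), Fraction(1, 10)]
print(belief_change(*values) == closed_form(*values), confirms(*values))
```

Only the sign of the numerator $BC - AD - DE$ matters, which is why the later bounds on $A$ to $E$ suffice to decide confirmation without computing $M$.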


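Example 5 (Confirmation Criterion Applied to Example 3). In Example 3 we picked a particular universal prior and $x_1 = BR$. In this case, the values for $A$, $B$, $C$, $D$, and $E$ are

$$A \in [\tfrac{1}{2}, \tfrac{1}{2} + \varepsilon] \quad B \in [0, \varepsilon] \quad C \in [\tfrac{1}{4}, \tfrac{1}{4} + \varepsilon] \quad D \in [\tfrac{1}{4} - \varepsilon, \tfrac{1}{4}] \quad E \in [0, \varepsilon].$$

We invoke Lemma 4 with $\varepsilon := 7\%$ to get that $x_1 = BR$ disconfirms $H$:

$$AD + DE \geq \tfrac{1}{2}\left(\tfrac{1}{4} - \varepsilon\right) = 0.09 > 0.0224 = \varepsilon\left(\tfrac{1}{4} + \varepsilon\right) \geq BC.$$

◊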
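The inequality in Example 5 is a worst-case statement over the intervals listed above. Since all five quantities are nonnegative, $AD + DE$ is smallest at the lower endpoints and $BC$ is largest at the upper endpoints; this Python sketch (variable names mirror the text) evaluates exactly those two corners.

```python
from fractions import Fraction

eps = Fraction(7, 100)

# Intervals (lower, upper) for A, ..., E from Example 5.
A = (Fraction(1, 2), Fraction(1, 2) + eps)
B = (Fraction(0), eps)
C = (Fraction(1, 4), Fraction(1, 4) + eps)
D = (Fraction(1, 4) - eps, Fraction(1, 4))
E = (Fraction(0), eps)

# AD + DE is nondecreasing in A, D, and E on these nonnegative intervals,
# and BC is nondecreasing in B and C.
min_AD_plus_DE = A[0] * D[0] + D[0] * E[0]
max_BC = B[1] * C[1]

# Lemma 4: AD + DE > BC everywhere on the box, so x_1 = BR disconfirms H.
print(float(min_AD_plus_DE), float(max_BC), min_AD_plus_DE > max_BC)
```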


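Lemma 6 (Bounds on ABCDE). Let $x_{1:\infty} \in H$ be some computable infinite string. The following statements hold for every time step $t$.

(i) $0 < A, B, C, D, E < 1$
(ii) $A + B \stackrel{\times}{\leq} 2^{-K(t)}$
(iii) $A, B \stackrel{\times}{\geq} 2^{-K(t)}$
(iv) $C \stackrel{\times}{\geq} 1$
(v) $D \stackrel{\times}{\geq} 2^{-m(t)}$
(vi) $D \to 0$ as $t \to \infty$
(vii) $E \to 0$ as $t \to \infty$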












Proof. Let $p$ be a program that computes the infinite string $x_{1:\infty}$.

(i) Each of $A, B, C, D, E$ is a probability value and hence bounded between 0 and 1. These bounds are strict because for any finite string there is a program that prints that string.

(ii) A proof is given in the appendix of [8]. Let $a \neq x_t$ and let $q$ be the shortest program for the string $x_{<t}a$, i.e., $|q| = Km(x_{<t}a)$. We can reconstruct $t$ by running $p$ and $q$ in parallel and counting the number of characters printed until their output differs. Therefore there is a constant $c$ independent of $t$ such that $K(t) \leq |p| + |q| + c = |p| + Km(x_{<t}a) + c$. Hence

$$2^{-Km(x_{<t}a)} \leq 2^{-K(t) + |p| + c}. \tag{8}$$

The set $E := \{x_{<t}a \mid t \in \mathbb{N}, a \neq x_t\}$ is recursively enumerable and prefix-free, so Lemma 1 yields a constant $c_E$ such that

$$M(x_{<t}a) \leq 2^{-Km(x_{<t}a) + c_E} \leq 2^{-K(t) + |p| + c + c_E}.$$

With $A + B \leq (\#\mathcal{X} - 1) \max_{a \neq x_t} M(x_{<t}a)$ the claim follows.

(iii) Let $a \neq x_t$ and let $q$ be the shortest program to compute $t$, i.e., $|q| = K(t)$. We can construct a program that prints $x_{<t}a\overline{B}R$ by first running $q$ to get $t$, then running $p$ until it has produced a string of length $t - 1$, and then printing $a\overline{B}R$. Hence there is a constant $c$ independent of $t$ such that $Km(x_{<t}a\overline{B}R) \leq |q| + |p| + c = K(t) + |p| + c$. Therefore

$$M(x_{<t}a \cap H^c) \geq M(x_{<t}a\overline{B}R) \geq 2^{-Km(x_{<t}a\overline{B}R)} \geq 2^{-K(t) - |p| - c}.$$

For the bound on $M(x_{<t}a \cap H)$ we proceed analogously (with $a \neq \overline{B}R$), except that instead of printing $\overline{B}R$ the program goes into an infinite loop.

(iv) Since by assumption the program $p$ computes $x_{1:\infty} \in H$, we have that $M(x_{1:t} \cap H) \geq 2^{-|p|}$.

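(v) Let $n \geq t$ be an integer such that $K(n) = m(t)$ (such an $n$ exists by the definition of $m$). We proceed analogously to (iii) with a program $q$ that prints $n$ such that $|q| = m(t)$. Next, we write a program that produces the output $x_{1:n}\overline{B}R$, which yields a constant $c$ independent of $t$ such that

$$M(x_{1:t} \cap H^c) \geq M(x_{1:n}\overline{B}R) \geq 2^{-Km(x_{1:n}\overline{B}R)} \geq 2^{-|q| - |p| - c} = 2^{-m(t) - |p| - c}.$$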

(vi) This follows from Blackwell and Dubins’ result (4):

$$D = (C + D)\left(1 - \frac{C}{C + D}\right) \leq (1 + 1)\bigl(1 - M(H \mid x_{1:t})\bigr) \to 0 \text{ as } t \to \infty.$$

(vii) $\sum_{t} M(\{x_{<t}\}) = M(\{x_{<t} \mid t \in \mathbb{N}\}) \leq 1$, thus $E = M(\{x_{<t}\}) \to 0$. □


Lemma 6 states the bounds that illustrate the ideas behind our results informally: From $A \stackrel{\times}{=} B \stackrel{\times}{=} 2^{-K(t)}$ (ii, iii) and $C \stackrel{\times}{=} 1$ (iv) we get

$$BC \stackrel{\times}{=} 2^{-K(t)}, \qquad AD \stackrel{\times}{=} 2^{-K(t)} D.$$

According to Lemma 4 the sign of $AD + DE - BC$ tells us whether our belief in $H$ increases (negative) or decreases (positive).

Since $D \to 0$ (vi), the term $AD \stackrel{\times}{=} 2^{-K(t)} D$ will eventually be smaller than $BC \stackrel{\times}{=} 2^{-K(t)}$. Therefore it is crucial how fast $E \to 0$ (vii). If we use $M$, then $E \to 0$ slower than $B \to 0$ (ii), therefore $AD + DE - BC$ is positive infinitely often (Theorem 8). If we use $M_{\text{norm}}$ instead of $M$, then $E = 0$ and hence $AD + DE - BC = AD - BC$ is negative except for a finite number of steps (Theorem 11).


4.2 Unnormalized Solomonoff Prior 


Theorem 7 (Counterfactual Black Raven Disconfirms H). Let $x_{1:\infty}$ be a computable infinite string such that $x_{1:\infty} \in H$ ($x_{1:\infty}$ does not contain any non-black ravens) and $x_t \neq BR$ infinitely often. Then there is a time step $t \in \mathbb{N}$ (with $x_t \neq BR$) such that $M(H \mid x_{<t}BR) < M(H \mid x_{<t})$.


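Proof. Let $t$ be a time step such that $x_t \neq BR$. From the proof of Lemma 6 (iii) we get $M(H^c \cap x_{<t}BR) \geq 2^{-K(t)-c}$, and thus

$$M(H \mid x_{<t}BR) \leq \frac{M(H \cap x_{<t}BR) + M(H^c \cap x_{<t}BR) - 2^{-K(t)-c}}{M(x_{<t}BR)} = 1 - \frac{2^{-K(t)-c}}{M(x_{<t}BR)} \leq 1 - \frac{2^{-K(t)-c}}{A + B} \leq 1 - 2^{-c-c'},$$

where we used $M(x_{<t}BR) \leq A + B$ (because $BR \neq x_t$) and $A + B \leq 2^{-K(t)+c'}$ from Lemma 6 (ii). From (4) there is a $t_0$ such that for all $t \geq t_0$ we have $M(H \mid x_{<t}) > 1 - 2^{-c-c'} \geq M(H \mid x_{<t}BR)$. Since $x_t \neq BR$ infinitely often according to the assumption, there is an $x_t \neq BR$ for some $t \geq t_0$. □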


Note that the black raven in Theorem 7 that we observe at time $t$ is counterfactual, i.e., not part of the sequence $x_{1:\infty}$. If we picked the binary alphabet $\{BR, \overline{B}R\}$ and denoted only observations of ravens, then Theorem 7 would not apply: the only infinite string in $H$ is $BR^\infty$ and the only counterfactual observation is $\overline{B}R$, which immediately falsifies the hypothesis $H$. The following theorem gives an on-sequence result.


Theorem 8 (Disconfirmation Infinitely Often for M). Let $x_{1:\infty}$ be a computable infinite string such that $x_{1:\infty} \in H$ ($x_{1:\infty}$ does not contain any non-black ravens). Then $M(H \mid x_{1:t}) < M(H \mid x_{<t})$ for infinitely many time steps $t \in \mathbb{N}$.


Proof. We show that there are infinitely many $n \in \mathbb{N}$ such that for each $n$ there is a time step $t > n$ where the belief in $H$ decreases. The $n$s are picked to have low Kolmogorov complexity, while the $t$s are incompressible. The crucial insight is that a program that goes into an infinite loop at time $t$ only needs to know $n$ and not $t$, thus making this program much smaller than $K(t) \geq \log t$.

Let $q_n$ be a program that, starting with $t = n + 1$, incrementally outputs $x_{1:t}$ as long as $K(t) < \log t$. Formally, let $\phi(y, k)$ be a computable function such that $\phi(y, k + 1) \leq \phi(y, k)$ and $\lim_{k \to \infty} \phi(y, k) = K(y)$.















    program q_n:
        t := n + 1
        output x_{<t}
        while true:
            k := 0
            while φ(t, k) ≥ log t:
                k := k + 1
            output x_t
            t := t + 1


The program only needs to know $p$ and $n$, so we have that $|q_n| \leq K(n) + c$ for some constant $c$ independent of $n$ and $t$. For the smallest $t > n$ with $K(t) \geq \log t$, the program $q_n$ will go into an infinite loop and thus fail to print a $t$-th character. Therefore

$$E = M(\{x_{<t}\}) \geq 2^{-K(n)-c}. \tag{9}$$

Incompressible numbers are very dense, and a simple counting argument shows that there must be one between $n$ and $4n$ […, Thm. 3.3.1 (i)]. Furthermore, we can assume that $n$ is large enough such that $m(4n) \leq m(n) + 1$ (since $m$ grows slower than the logarithm). Then

$$m(t) \leq m(4n) \leq m(n) + 1 \leq K(n) + 1. \tag{10}$$


Since the function $m$ grows slower than any unbounded computable function, we find infinitely many $n$ such that

$$K(n) \leq \tfrac{1}{2}\left(\log n - c - c' - c'' - 1\right), \tag{11}$$

where $c'$ and $c''$ are the constants from Lemma 6 (ii) and (v). For each such $n$, there is a $t > n$ with $K(t) \geq \log t$, as discussed above. This entails

$$m(t) + K(n) + c + c'' \leq 2K(n) + 1 + c + c'' \leq \log n - c' < \log t - c' \leq K(t) - c'. \tag{12}$$

From Lemma 6, (9), and (12) we get

$$AD + DE \geq DE \geq 2^{-m(t)-c''}\, 2^{-K(n)-c} > 2^{-K(t)+c'} \geq BC.$$

With Lemma 4 we conclude that $x_t$ disconfirms $H$. □
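The only property of $q_n$ that the proof needs is that it stops printing exactly at the first $t > n$ with $K(t) \geq \log t$. The Python sketch below simulates that control flow under loud assumptions: the true $K$ is incomputable, so `toy_K` is a hypothetical stand-in (twice the number of 1-bits of $y$), `phi` is a made-up upper approximation that reaches `toy_K` after five steps, `x` is an arbitrary computable string in $H$, and the inner loop is cut off after `max_k` iterations so the simulation can report where the real program would hang.

```python
import math

def toy_K(y: int) -> int:
    """Hypothetical stand-in for K(y); real Kolmogorov complexity is incomputable."""
    return 2 * bin(y).count("1")

def phi(y: int, k: int) -> int:
    """Toy approximation: non-increasing in k and equal to toy_K(y) for all k >= 5."""
    return toy_K(y) + max(0, 5 - k)

def x(t: int) -> str:
    """A computable string in H: BR at even t, BnR (black non-raven) at odd t."""
    return "BR" if t % 2 == 0 else "BnR"

def run_q(n: int, max_k: int = 1_000, max_t: int = 10_000):
    """Simulate q_n; return (printed symbols, time t at which q_n hangs, or None)."""
    t = n + 1
    printed = [x(i) for i in range(1, t)]   # output x_{<t}
    while t <= max_t:
        k = 0
        while phi(t, k) >= math.log2(t):    # wait for evidence that toy_K(t) < log t
            k += 1
            if k > max_k:                   # the real q_n would loop forever here
                return printed, t
        printed.append(x(t))                # output x_t
        t += 1
    return printed, None

printed, hang = run_q(7)
print(len(printed), hang)
```

In the proof, the point is that the hanging program has length about $K(n)$ rather than $K(t)$, which is what yields the lower bound (9) on $E$.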

To get that $M$ violates Nicod’s criterion infinitely often, we apply Theorem 8 to the computable infinite string $BR^\infty$.


4.3 Normalized Solomonoff Prior


In this section we show that for computable infinite strings, our belief in the hypothesis $H$ is non-increasing at most finitely many times if we normalize $M$.

For this section we define $A'$, $B'$, $C'$, $D'$, and $E'$ analogous to $A$, $B$, $C$, $D$, and $E$ as given in Figure 1 with $M_{\text{norm}}$ instead of $M$.












Lemma 9 ($M_{\text{norm}} \geq M$). $M_{\text{norm}}(x) \geq M(x)$ for all $x \in \mathcal{X}^*$.

Proof. We use induction on the length of $x$: $M_{\text{norm}}(\epsilon) = 1 = M(\epsilon)$ and

$$M_{\text{norm}}(xa) = \frac{M_{\text{norm}}(x)\,M(xa)}{\sum_{b \in \mathcal{X}} M(xb)} \geq \frac{M(x)\,M(xa)}{\sum_{b \in \mathcal{X}} M(xb)} \geq \frac{M(x)\,M(xa)}{M(x)} = M(xa).$$

The first inequality holds by induction hypothesis and the second inequality uses the fact that $M$ is a semimeasure. □


The following lemma states the same bounds for $M_{\text{norm}}$ as given in Lemma 6 except for (i) and (vii).


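Lemma 10 (Bounds on A'B'C'D'E'). Let $x_{1:\infty} \in H$ be some infinite string computed by program $p$. The following statements hold for all time steps $t$.

(i) $A \leq A'$, $B \leq B'$, $C \leq C'$, $D \leq D'$
(ii) $A' + B' \stackrel{\times}{\leq} 2^{-K(t)}$
(iii) $A', B' \stackrel{\times}{\geq} 2^{-K(t)}$
(iv) $C' \stackrel{\times}{\geq} 1$
(v) $D' \stackrel{\times}{\geq} 2^{-m(t)}$
(vi) $D' \to 0$ as $t \to \infty$
(vii) $E' = 0$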


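Proof. (i) Follows from Lemma 9.

(ii) Let $a \neq x_t$. From Lemma 6 (ii) we have $M(x_{<t}a) \stackrel{\times}{\leq} 2^{-K(t)}$. Thus

$$M_{\text{norm}}(x_{<t}a) = \frac{M_{\text{norm}}(x_{<t})\,M(x_{<t}a)}{\sum_{b \in \mathcal{X}} M(x_{<t}b)} \stackrel{\times}{\leq} \frac{M_{\text{norm}}(x_{<t})\,2^{-K(t)}}{\sum_{b \in \mathcal{X}} M(x_{<t}b)} \stackrel{\times}{\leq} 2^{-K(t)},$$

where the last step uses $M_{\text{norm}}(x_{<t}) \leq 1$ and $\sum_{b \in \mathcal{X}} M(x_{<t}b) \geq M(x_{1:t}) \stackrel{\times}{\geq} 1$ by Lemma 6 (iv).

(iii) to (vi) follow from (i) together with Lemma 6; for (vi), Blackwell and Dubins’ result applies to $M_{\text{norm}}$ because $M_{\text{norm}}$ dominates $\mu$ by Lemma 9 and (3).

(vii) $M_{\text{norm}}$ is a measure, so $M_{\text{norm}}(\{x_{<t}\}) = 0$, hence $E' = 0$. □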


Theorem 11 (Disconfirmation Finitely Often for $M_{\text{norm}}$). Let $x_{1:\infty}$ be a computable infinite string such that $x_{1:\infty} \in H$ ($x_{1:\infty}$ does not contain any non-black ravens). Then there is a time step $t_0$ such that $M_{\text{norm}}(H \mid x_{1:t}) > M_{\text{norm}}(H \mid x_{<t})$ for all $t \geq t_0$.


Intuitively, at time step $t_0$, $M_{\text{norm}}$ has learned that it is observing the infinite string $x_{1:\infty}$ and there are no short programs remaining that support the hypothesis $H$ but predict something other than $x_{1:\infty}$.


Proof. We use Lemma 10 (ii), (iii), (iv), and (vii) to conclude

$$A'D' + D'E' - B'C' \leq 2^{-K(t)+c} D' - 2^{-K(t)-c'-c''} = 2^{-K(t)+c}\left(D' - 2^{-c-c'-c''}\right).$$

From Lemma 10 (vi) we have that $D' \to 0$, so there is a $t_0$ such that for all $t \geq t_0$ we have $D' < 2^{-c-c'-c''}$. Thus $A'D' + D'E' - B'C'$ is negative for $t \geq t_0$. Now Lemma 4 entails that the belief in $H$ increases. □




























Interestingly, Theorem 11 does not hold for $M$ since that would contradict Theorem 8. The reason is that there are quite short programs that produce $x_{<t}$, but then loop forever without printing further symbols. However, from $p$ and such a program for $x_{<t}$ we cannot reconstruct $t$, hence a program for $x_{<t}$ does not give us a bound on $K(t)$.

Since we get the same bounds for $M_{\text{norm}}$ as in Lemma 6, the result of Theorem 7 transfers to $M_{\text{norm}}$.


Corollary 12 (Counterfactual Black Raven Disconfirms H). Let $x_{1:\infty}$ be a computable infinite string such that $x_{1:\infty} \in H$ ($x_{1:\infty}$ does not contain any non-black ravens) and $x_t \neq BR$ infinitely often. Then there is a time step $t \in \mathbb{N}$ (with $x_t \neq BR$) such that $M_{\text{norm}}(H \mid x_{<t}BR) < M_{\text{norm}}(H \mid x_{<t})$.


For incomputable infinite strings the belief in H can decrease infinitely often: 


Corollary 13 (Disconfirmation Infinitely Often for $M_{\text{norm}}$). There is an (incomputable) infinite string $x_{1:\infty} \in H$ such that $M_{\text{norm}}(H \mid x_{1:t}) < M_{\text{norm}}(H \mid x_{<t})$ infinitely often as $t \to \infty$.


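Proof. We iterate Corollary 12: starting with $\overline{B}\overline{R}^\infty$, we get a time step $t_1$ such that observing $BR$ at time $t_1$ disconfirms $H$. We set $x_{1:t_1} := \overline{B}\overline{R}^{\,t_1 - 1} BR$ and apply Corollary 12 to $x_{1:t_1}\overline{B}\overline{R}^\infty$ to get a time step $t_2 > t_1$ such that observing $BR$ at time $t_2$ disconfirms $H$. Then we set $x_{1:t_2} := x_{1:t_1}\overline{B}\overline{R}^{\,t_2 - t_1 - 1} BR$, and so on. □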


4.4 Stochastically Sampled Strings 

The proof techniques from the previous subsections do not generalize to strings that are sampled stochastically. The main obstacle is the complexity of counterfactual observations $x_{<t}a$ with $a \neq x_t$: for deterministic strings $Km(x_{<t}a) \to \infty$, while for stochastically sampled strings $Km(x_{<t}a)$ is typically about as large as $Km(x_{1:t})$. Consider the following example.

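Example 14 (Uniform i.i.d. Observations). Let $\lambda_H$ be a measure that generates uniform i.i.d. symbols from $\{BR, B\overline{R}, \overline{B}\overline{R}\}$. Formally,

$$\lambda_H(x) := \begin{cases} 0 & \text{if } \overline{B}R \in x, \text{ and}\\ 3^{-|x|} & \text{otherwise.} \end{cases}$$

By construction, $\lambda_H(H) = 1$. By Lemma 2 we have $A, C, E \stackrel{\times}{=} 3^{-t}$ and $B, D \stackrel{\times}{=} 3^{-t} 2^{-m(t)}$ with $\lambda_H$-probability one. According to Lemma 4 the sign of $AD + DE - BC$ is indicative of the change in belief in $H$. But this is inconclusive both for $M$ and $M_{\text{norm}}$ since each of the summands $AD$, $BC$, and $DE$ (in case $E \neq 0$) goes to zero at the same rate:

$$AD \stackrel{\times}{=} DE \stackrel{\times}{=} BC \stackrel{\times}{=} 3^{-2t} 2^{-m(t)}.$$

Whether $H$ gets confirmed or disconfirmed thus depends on the universal Turing machine and/or the probabilistic outcome of the string drawn from $\lambda_H$. ◊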
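A quick check that $\lambda_H$ is indeed a measure concentrated on $H$: the Python sketch below encodes the four symbols with ASCII labels of our own choosing (`BR`, `nBR` for $\overline{B}R$, `BnR` for $B\overline{R}$, `nBnR` for $\overline{B}\overline{R}$) and verifies that the mass of all strings of a given length sums to one and that each cylinder splits consistently into its one-symbol extensions.

```python
from fractions import Fraction
from itertools import product

SYMBOLS = ("BR", "nBR", "BnR", "nBnR")

def lambda_H(x: tuple) -> Fraction:
    """Uniform i.i.d. over the three symbols other than nBR (Example 14)."""
    if "nBR" in x:
        return Fraction(0)
    return Fraction(1, 3 ** len(x))

def total_mass(n: int) -> Fraction:
    """Sum of lambda_H over all strings of length n."""
    return sum((lambda_H(w) for w in product(SYMBOLS, repeat=n)), Fraction(0))

def is_consistent(x: tuple) -> bool:
    """Measure condition: lambda_H(x) equals the sum over its one-symbol extensions."""
    return lambda_H(x) == sum((lambda_H(x + (a,)) for a in SYMBOLS), Fraction(0))

print([total_mass(n) for n in range(4)])
```

Since no mass ever lands on a string containing `nBR`, every sampled sequence lies in $H$, which is the assumption $\lambda_H(H) = 1$ used in the example.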


















5 Discussion 


We chose to present our results in the setting of the black raven problem to make them more accessible to intuition and more relatable to existing literature. But these results hold more generally: our proofs follow from the bounds on $A$, $B$, $C$, $D$, and $E$ given in Lemma 6 and Lemma 10. These bounds rely on the fact that we are observing a computable infinite string and that at any time step $t$ there are programs consistent with the observation history that contradict the hypothesis and there are programs consistent with the observation history that are compatible with the hypothesis. No further assumptions on the alphabet, the hypothesis $H$, or the universal Turing machine are necessary.

In our formalization of the raven problem given in Section 3 we used an alphabet with four symbols. Each symbol indicates one of four possible types of observations according to the two binary predicates blackness and ravenness. One could object that this formalization discards important structure from the problem: $BR$ and $\overline{B}R$ have more in common than $BR$ and $\overline{B}\overline{R}$, yet as symbols they are all the same. Instead, we could use the Latin alphabet and spell out ‘black’, ‘non-black’, ‘raven’, and ‘non-raven’. The results given in this paper would still apply analogously.

Solomonoff induction violates Nicod’s criterion not at every time step, but only at some of them. Generally, whether Nicod’s criterion should be adhered to depends on whether the paradoxical conclusion is acceptable. A different Bayesian reasoner might be tempted to argue that a green apple does confirm the hypothesis $H$, but only to a small degree, since there are vastly more non-black objects than ravens [5]. This leads to the acceptance of the paradoxical conclusion, and this solution to the confirmation paradox is known as the standard Bayesian solution. It is equivalent to the assertion that blackness is equally probable regardless of whether $H$ holds: $P(\text{black} \mid H) \approx P(\text{black})$ […]. Whether or not this holds depends on our prior beliefs.

The following is a very concise example against the standard Bayesian solution [3]: There are two possible worlds, the first has 100 black ravens and a million other birds, while the second has 1000 black ravens, one white raven, and a million other birds. Now we draw a bird uniformly at random, and it turns out to be a black raven. Contrary to what Nicod’s criterion claims, this is strong evidence that we are in fact in the second world, and in this world non-black ravens exist.
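The numbers in this example can be made concrete. The Python sketch below assumes, which the example leaves open, that both worlds are equally likely a priori; it computes the probability of drawing a black raven in each world and the resulting posterior belief in $H$, which holds only in the first world.

```python
from fractions import Fraction

# World 1: 100 black ravens and 1,000,000 other birds; H holds.
# World 2: 1,000 black ravens, 1 white raven, 1,000,000 other birds; H fails.
like_1 = Fraction(100, 100 + 1_000_000)           # P(black raven | world 1)
like_2 = Fraction(1_000, 1_000 + 1 + 1_000_000)   # P(black raven | world 2)

# Assumption: equal prior weight on both worlds (not specified in the text).
prior_1 = prior_2 = Fraction(1, 2)

posterior_2 = prior_2 * like_2 / (prior_1 * like_1 + prior_2 * like_2)
posterior_H = 1 - posterior_2                      # H is true only in world 1

print(f"P(H) = {float(prior_1):.3f} -> P(H | black raven) = {float(posterior_H):.3f}")
```

Under this assumption the belief in $H$ falls from 1/2 to about 0.09, because a black raven is roughly ten times more likely to be drawn in the second world.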

For another, more intuitive example: Suppose you do not know anything 
about ravens and you have a friend who collects atypical objects. If you see a 
black raven in her collection, surely this would not increase your belief in the 
hypothesis that all ravens are black. 

We must conclude that violating Nicod’s criterion is not a fault of Solomonoff induction. Instead, we should accept that for Bayesian reasoning Nicod’s criterion, in its generality, is false! Quoting the great Bayesian master E. T. Jaynes […, p. 144]:

In the literature there are perhaps 100 ‘paradoxes’ and controversies which are like this, in that they arise from faulty intuition rather than faulty mathematics. Someone asserts a general principle that seems to him intuitively right. Then, when probability analysis reveals the error, instead of taking this opportunity to educate his intuition, he reacts by rejecting the probability analysis.
List of Notation


$:=$ defined to be equal
$\#A$ the cardinality of the set $A$, i.e., the number of elements
$\mathcal{X}$ a finite alphabet
$\mathcal{X}^*$ the set of all finite strings over the alphabet $\mathcal{X}$
$\mathcal{X}^\infty$ the set of all infinite strings over the alphabet $\mathcal{X}$
$\mathcal{X}^\sharp$ $\mathcal{X}^\sharp := \mathcal{X}^* \cup \mathcal{X}^\infty$, the set of all finite and infinite strings over the alphabet $\mathcal{X}$
$\Gamma_x$ the set of all finite and infinite strings that start with $x$
$x, y$ finite or infinite strings, $x, y \in \mathcal{X}^\sharp$
$x \sqsubseteq y$ the string $x$ is a prefix of the string $y$
$\epsilon$ the empty string
$\varepsilon$ a small positive rational number
$t$ (current) time step
$n$ natural number
$K(x)$ Kolmogorov complexity of the string $x$: the length of the shortest program that prints $x$ and halts
$m(t)$ the monotone lower bound on $K$, formally $m(t) := \min_{n \geq t} K(n)$
$Km(x)$ monotone Kolmogorov complexity of the string $x$: the length of the shortest program on the monotone universal Turing machine that prints something starting with $x$
$BR$ a symbol corresponding to the observation of a black raven
$\overline{B}R$ a symbol corresponding to the observation of a non-black raven
$B\overline{R}$ a symbol corresponding to the observation of a black non-raven
$\overline{B}\overline{R}$ a symbol corresponding to the observation of a non-black non-raven
$H$ the hypothesis ‘all ravens are black’, formally defined in (2)
$U$ the universal (monotone) Turing machine
$M$ the Solomonoff prior
$M_{\text{norm}}$ the normalized Solomonoff prior, defined according to (1)
$p, q$ programs on the universal (monotone) Turing machine


